Abstract

In these lectures I will present an introduction to the results that have recently been
obtained in constraint optimization of random problems using statistical mechanics
techniques. After presenting the general results, in order to simplify the presentation,
I will describe in detail only the problems related to the coloring of a random graph.

1 Introduction

Statistical mechanics of disordered systems and constraint optimization of random problems
have many points in common. In these lectures I will describe some of the results that have
been obtained by applying the techniques developed in the study of disordered systems to
optimization theory. The aim of these studies is twofold:

• Statistical mechanics techniques are very sophisticated and powerful: using them it is
possible to obtain very relevant heuristic and eventually exact results for optimization
theory. Better algorithms can also be found.






• Applying the techniques of statistical mechanics in a rather different setting needs the 
invention of new theoretical tools that can be carried back to the study of systems of 
physical interest. 

This cross fertilization process between different fields usually produces very interesting 
results. 

These lectures are organized as follows. In the next section I will present some general
considerations on the relation between optimization theory and statistical mechanics. In
section 3 I will describe a well understood problem, i.e. bipartite matching, for which
many analytic and numeric results are available. In section 4 I will introduce some
of the basic ideas and notations in constraint optimization and I will stress the relations
with statistical mechanics and with the theory of phase transitions. In section 5 I will recall
the main properties of random lattices and of the Bethe (cavity) approximation, which will
be heavily used in the rest of these lectures. In section 6 I will present the problem
of graph coloring and I will derive the appropriate cavity equations in the colorable phase.
In section 7 I will sketch the survey method that allows us to compute in a (hopefully)
exact way the phase transition between the colorable and the uncolorable phase. Finally in
the last section I will present some conclusions and perspectives.

2 General considerations



In statistical mechanics [1] one is interested in computing the partition function as a function
of the temperature, the partition function being defined as

$$ Z(\beta) = \sum_C \exp(-\beta H(C)) , \qquad (1) $$

where $\beta = 1/(kT)$, T being the temperature and k being equal to two thirds of the
Boltzmann-Drude constant. The variable C denotes a generic element of the configuration
space. The quantity H(C) is the Hamiltonian of the system. Starting from $Z(\beta)$ we
can reconstruct the other thermodynamic quantities.

In optimization problems we are interested in finding the configuration C* (or the
configurations, since the ground state may be degenerate) that minimizes the function
H(C), which in this context is called the cost function. We are also interested in knowing
the minimal cost, i.e.

H* = H(C*) . (2) 

It is trivial to realize that optimization is a particular case of statistical mechanics.
Let us denote by E(T) and S(T) the expectation value of the energy and the entropy as
functions of the temperature T. One finds that only the zero temperature behaviour is
relevant for optimization:

H* = E(0) (3)

and the number of optimizing configurations is just equal to exp(S(0)).
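
The zero temperature statements can be checked on a toy system. The following is a minimal sketch, assuming a hypothetical cost function on three binary variables (it is not taken from the text) and units where k = 1: it enumerates all configurations and evaluates E(T) and S(T) at a given inverse temperature.

```python
import itertools
import math

def toy_cost(config):
    """Hypothetical cost: number of equal adjacent pairs on an open chain."""
    return sum(1 for a, b in zip(config, config[1:]) if a == b)

def thermodynamics(cost, configs, beta):
    """Return (E, S) at inverse temperature beta, in units where k = 1."""
    energies = [cost(c) for c in configs]
    e_min = min(energies)
    # Energies are shifted by the minimum to avoid underflow at large beta.
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    z_shifted = sum(weights)
    mean_energy = sum(e * w for e, w in zip(energies, weights)) / z_shifted
    # ln Z = -beta * e_min + ln(z_shifted) and S = ln Z + beta * E.
    entropy = math.log(z_shifted) + beta * (mean_energy - e_min)
    return mean_energy, entropy

configs = list(itertools.product([0, 1], repeat=3))

if __name__ == "__main__":
    for beta in (0.1, 1.0, 10.0, 50.0):
        e, s = thermodynamics(toy_cost, configs, beta)
        print(beta, e, math.exp(s))
```

For this toy cost the minimum H* = 0 is reached by the two alternating configurations, so at large beta the energy approaches 0 and exp(S) approaches 2.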

Often in optimization we are interested in knowing something more (e.g. their number)
about the nearly optimizing configurations, i.e. those configurations such that H(C) =
H* + $\epsilon$. This information may be gathered by studying the system at small temperatures.

In both cases, statistical mechanics and optimization theory, we are interested in studying
what happens in the thermodynamic limit, i.e. when the number N of variables (the
dimension of the configuration space) goes to infinity, or is very large (finite size effects are
more important in optimization theory than in statistical mechanics).

While optimization theory could be considered only from an abstract point of view as the 
zero temperature limit of statistical mechanics, in reality it differs from statistical mechanics 
in many crucial aspects, e.g. goals, problems and techniques. Optimization theory is really 
a different science, with many points of contact with statistical mechanics. 

Optimization problems where the Hamiltonian has a simple form are not usually the
most interesting (on the contrary an incredibly large amount of work has been done in
statistical mechanics on the Ising problems). In optimization theory it is natural to consider
an ensemble of Hamiltonians and to find out the properties of a generic Hamiltonian that
belongs to this ensemble.

In other words we have a Hamiltonian that is characterized by a set of parameters
(collectively denoted by J) and we have a probability distribution $\mu(J)$ on this parameter
space. One would like to compute the ensemble average (footnote 1), e.g.

$$ E_{AV}(T) = \int d\mu(J) \, E_J(T) \equiv \overline{E_J(T)} , \qquad (4) $$

where the overline denotes the average over the ensemble of Hamiltonians. Eventually we are
going to set T = 0. This approach has been developed in statistical mechanics for studying
systems with quenched disorder (e.g. spin glasses) (footnote 2). Only the recent development of
these techniques allows us to use statistical mechanics tools to study optimization theory.

Footnote 1: Sometimes the ensemble is defined in a loose way, e.g. problems that arise from
practical instances, such as chip placement on a computer board or register allocation.

In general we are interested in computing the probability distribution P(E) of the zero
temperature energy E over the ensemble:

$$ P(E) = \overline{\delta(E_J(0) - E)} . \qquad (5) $$

In the thermodynamic limit, where the number N of variables goes to infinity, if E is
well normalized in this limit, we expect that its probability distribution becomes a delta
function, according to the general principle that intensive quantities do not fluctuate in
the thermodynamic limit.

The interest of computer scientists is not limited to the computation of the ensemble
average: they live in the real world and they are extremely interested in finding efficient
algorithms to compute, for a given $H_J(C)$ (i.e. for an instance of the problem), the
configuration C* that minimizes it. Of course the configuration C* can always be found (at least
in the case of a finite configuration space) by computing H(C) for all possible choices of the
configuration, but this algorithm is very very slow (footnote 3). One would like to find out the most
efficient algorithms and to correlate their performances to the properties of the ensemble of
problems.

A given algorithm can be tested by checking how it performs for different instances:
for each problem there are well known benchmarks that correspond to different types of
ensembles. We can also do a more theoretical analysis. For a given algorithm we can define
the time $t_J$ as the time (i.e. the number of operations) it takes to find the ground state of
the Hamiltonian $H_J$. We can define the average time (in a logarithmic scale) as:

$$ \ln(t_{ln}) = \overline{\ln(t_J)} . \qquad (6) $$

The introduction of the logarithm is very important: very often the linearly averaged time

$$ t_l = \overline{t_J} \qquad (7) $$

is much greater than $t_{ln}$ because it may be dominated by rare configurations that give an
extremely large contribution (footnote 4). We can also define the most likely time ($t_{ml}$), as the time
where the probability distribution of the time has a maximum, and the median time.

In many cases the logarithmically averaged time, the median and the most likely times
behave in a quite similar way when the number N of variables becomes large: in the best
of the possible worlds

$$ \ln(t_J) / \overline{\ln(t_J)} \qquad (8) $$

does not fluctuate in the thermodynamic limit.
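
The difference between the linear and the logarithmic average can be illustrated with a small numerical sketch. The list of running times below is hypothetical (99 easy instances and one very hard one) and is only meant to show how a single rare instance dominates the linear average while leaving the logarithmic average and the median almost untouched.

```python
import math

def linear_average(times):
    """Linearly averaged time: the plain mean of the t_J."""
    return sum(times) / len(times)

def log_average(times):
    """Logarithmically averaged time: exp of the mean of ln(t_J)."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

def median(times):
    ordered = sorted(times)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])

# Hypothetical running times: 99 easy instances and one very hard one.
times = [10.0] * 99 + [1.0e9]

if __name__ == "__main__":
    print("linear average:", linear_average(times))
    print("log average:   ", log_average(times))
    print("median:        ", median(times))
```

Here the logarithmic average is 10^1.08 (about 12), the median is 10, and the linear average is about 10^7.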

On the contrary very often the worst case time, defined as

$$ t_{wc} = \max_J (t_J) , \qquad (9) $$

is much larger than the other times [3]. The most likely time (or the logarithmically averaged
time) is the most interesting; unfortunately the computation of the worst case time is the
easiest one from the theoretical point of view (in empirical studies, i.e. when testing
the algorithm with real instances, the situation is just the opposite).

Footnote 2: Many of the techniques developed for disordered systems can be used also for the
study of structurally disordered systems where no disorder is present in the Hamiltonian.

Footnote 3: This algorithm usually takes a time proportional to exp(AN).

Footnote 4: This is the same argument that implies that the annealed free energy and the
quenched free energy are very different: $\ln(\overline{Z_J})$ is very different from $\overline{\ln(Z_J)}$.

It is rather difficult to study analytically the most likely time: if the algorithm is
interpreted as a physical system, we must study its time evolution toward equilibrium. In spite
of the spectacular progress that has been made in recent years, a general theory of the
approach to equilibrium is lacking and this is one of the most difficult problems of statistical
mechanics. Fortunately present day theory allows us to make some predictions.

3 A well understood problem: bipartite matching 

Let us be specific and let us consider an example of an optimization problem for which we have
a good understanding. Let us consider the so called bipartite matching.

The problem is defined as follows: we have N cities and N wells that must be connected
one with the other: in other words there is one and only one well $\pi(i)$ that is connected
(matched) to the city i (footnote 5). In other words $\pi$ is a permutation and the configuration space
of all possible matchings contains N! elements.

The Hamiltonian $H_d(\pi)$ is given by

$$ H_d(\pi) = \sum_{i=1,N} d_{i,\pi(i)} , \qquad (10) $$

where $d_{i,k}$ is an N x N matrix expressing the cost of establishing a connection from i to k and
it characterizes an instance of the problem.

In spite of the fact that the configuration space contains N! elements there are algorithms
that for any d compute the minimum energy configuration C*(d) and the corresponding
energy (E*(d)) in a time that is less than $N^3$ multiplied by an appropriate constant. The
algorithmic simplicity of the model is related to the fact that it can be recast as a linear
programming problem. Indeed we can introduce the $N^2$ variables $n_{i,k} = 0, 1$ such that

$$ \sum_{i=1,N} n_{i,k} = \sum_{k=1,N} n_{i,k} = 1 . \qquad (11) $$

It is obvious that there is a one to one correspondence of these variables with the
permutations and the Hamiltonian may be rewritten as

$$ H_d(n) = \sum_{i,k=1,N} d_{i,k} n_{i,k} . \qquad (12) $$

Up to this point we have not gained too much. However we can enlarge the configuration
space by considering real variables $s_{i,k}$ that satisfy both the constraint eq. (11) (i.e.
$\sum_{i=1,N} s_{i,k} = \sum_{k=1,N} s_{i,k} = 1$) and the bound

$$ 0 \le s_{i,k} \le 1 . \qquad (13) $$

The problem is more general and it can be solved using linear programming tools in a very
efficient way. The surprise is that the minimum of

$$ H_d(s) = \sum_{i,k=1,N} d_{i,k} s_{i,k} \qquad (14) $$

always happens when $s_{i,k} = 0, 1$, as can be easily proved. In this way, by enlarging the
configuration space, we are able to find a fast solution of the original problem.

Footnote 5: In simple matching we have N cities that must be pair-wise matched: here the
configuration space contains (N - 1)!! elements.
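
The integrality of the minimum can be made plausible with a short standard argument. The following is only a sketch (it is not taken from the text, and the symbols $\gamma$, $\epsilon$ and $\Delta_\gamma$ are introduced here for illustration):

```latex
% Sketch: a linear cost over the enlarged space always has a 0/1 minimizer.
Let $s$ satisfy $\sum_i s_{i,k} = \sum_k s_{i,k} = 1$ and $0 \le s_{i,k} \le 1$,
and suppose that some entry is fractional, $0 < s_{i_1,k_1} < 1$.
Since row $i_1$ sums to $1$, it contains a second fractional entry $s_{i_1,k_2}$;
since column $k_2$ sums to $1$, it contains a second fractional entry $s_{i_2,k_2}$.
Continuing in this way we must eventually close a cycle $\gamma$ of fractional
entries that alternates between moves along a row and moves along a column.

Define $s(\epsilon)$ by adding $+\epsilon$ and $-\epsilon$ alternately along $\gamma$.
All the row and column sums are unchanged and the cost is linear in $\epsilon$:
\[
  H_d(s(\epsilon)) = H_d(s) + \epsilon \, \Delta_\gamma ,
\]
where $\Delta_\gamma$ is the alternating sum of the $d_{i,k}$ along $\gamma$.
Choosing the sign of $\epsilon$ such that $\epsilon \, \Delta_\gamma \le 0$ and
increasing $|\epsilon|$ until one entry of $\gamma$ reaches $0$ or $1$, we obtain
an allowed $s$ with no larger cost and strictly fewer fractional entries.
Iterating, we end with $s_{i,k} \in \{0, 1\}$, i.e. with a permutation.
```

This is the usual statement that the extremal points of the enlarged space are the permutation matrices, so a linear cost always attains its minimum on one of them.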

From the algorithmic point of view the situation is quite good. A different set of questions
arises if we consider the ensemble average of the value of H at the minima. In this case we
have to specify the ensemble by specifying the probability distribution of the matrix d.

A well studied case is when the matrix elements $d_{i,k}$ are independent identically
distributed (i.i.d.) variables: here we have to assign the probability of a single matrix element
$P(d_{i,k})$. Simple results are obtained when

$$ P(d_{i,k}) = \exp(-d_{i,k}) , \qquad (15) $$

i.e. $d_{i,k} = -\ln(r_{i,k})$, where $r_{i,k}$ is a random number with flat probability distribution in the
interval [0, 1].

It was first conjectured and then proved rigorously [7] that

$$ \overline{H(d)} = \sum_{n=1,N} \frac{1}{n^2} . \qquad (16) $$

The proof of this very simple result is a tour de force in combinatorial optimization [7] and
it is rather unlikely that similar results are valid for different choices of the function P(d).
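
For very small N the exact formula (the sum of 1/n^2 for n from 1 to N) can be compared with a direct simulation. The sketch below is only an illustration: it finds the minimum by brute force over all permutations (not by the fast algorithms mentioned above), and the function names, sample sizes and seed are arbitrary choices made here.

```python
import itertools
import math
import random

def min_matching_cost(d):
    """Brute-force minimum of H_d over all N! permutations (small N only)."""
    n = len(d)
    return min(sum(d[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))

def random_exponential_matrix(n, rng):
    """d[i][k] = -ln(r) with r uniform in (0, 1], i.e. P(d) = exp(-d)."""
    return [[-math.log(1.0 - rng.random()) for _ in range(n)] for _ in range(n)]

def average_min_cost(n, samples, seed=0):
    """Monte Carlo estimate of the ensemble average of the minimal cost."""
    rng = random.Random(seed)
    total = sum(min_matching_cost(random_exponential_matrix(n, rng))
                for _ in range(samples))
    return total / samples

def inverse_square_sum(n):
    """Sum of 1/k^2 for k = 1, ..., n."""
    return sum(1.0 / k ** 2 for k in range(1, n + 1))

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, average_min_cost(n, 20000), inverse_square_sum(n))
```

The two columns should agree within the statistical error of the simulation; the brute-force search limits this check to N of order 8 or less.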

The situation becomes simpler if we consider the case where N goes to infinity. Using
the techniques of statistical mechanics we can prove that

$$ \lim_{N \to \infty} \overline{H(d)} = \zeta(2) = \frac{\pi^2}{6} \qquad (17) $$

for all the probability distributions such that P(0) = 1. Moreover if

$$ P(d) = 1 - A d + O(d^2) , \qquad (18) $$

one finds that

$$ \overline{H(d)} = \zeta(2) - \frac{2 (1 - A) \zeta(3) + 1}{N} + O(N^{-2}) . \qquad (19) $$

This result is in agreement with the previous formula for the exponential probability, where
A = 1. The computation of the leading term when $N \to \infty$ is nowadays a real mathematical
theorem [11] (that was obtained 15 years after the original result derived using the usual
non rigorous methods of statistical mechanics), while the computation of the subleading
term (the one proportional to 1/N) was only done using statistical mechanics techniques.

The computation of the subleading term is a highly non-trivial computation: the first
two attempts gave wrong results (a factor two and a Jacobian were forgotten),
however now we have the correct version [10]. It has been verified to at least 3 decimal
digits. The computation of the term $-2(1 - A)\zeta(3)/N$ is rather simple. The real difficulty
is the term $-1/N$ (in the second computation with the missing Jacobian this term was
$-\pi^2/(12N)$); however its correctness can be checked with the exact result for the exponential
probability.

A similar problem is the travelling salesman: here we have to find the least expensive tour
that travels through all the cities (cities and wells are identified). The matrix d represents
the cost for going from city i to k and it is usually taken to be symmetric. The configuration
space is restricted to permutations that contain only one cycle (i.e. cyclic permutations)
and the Hamiltonian has always the same form (i.e. eq. (10)).

It is amusing to note that, when we restrict the configuration space, the nature of the
problem changes dramatically and it becomes much more difficult. The travelling salesman
problem is NP complete; this statement implies (if a well known conjecture, worth one
million dollars, is correct) that there is no algorithm that can find a solution of the travelling
salesman problem in a time that is bounded by a polynomial in N in the worst case (footnote 6). However
it is empirically found (and I believe it has also been proved) that, in the typical case of
a random distribution of the elements of the matrix d, a standard simple algorithm takes a
time that does not increase faster than $N^4$. This example shows that there is a dramatic
difference between the time in the typical case and in the worst case. Unfortunately, while it
is relatively easy to measure the scaling of the typical time as a function of N, the results for
the worst cases are very difficult to obtain numerically.

These examples are particularly interesting because they tell us that the scenario may
radically change by an apparently small change in the definition of the problem. However
from the point of view of analytic computations the results are only slightly more difficult
in the travelling salesman problem. If P(0) (defined in eq. (15)) is equal to one, we find that
in the large N limit

$$ \overline{H(d)} = 2.041 . \qquad (20) $$

Apparently there are no serious difficulties in computing the 1/N corrections to the
travelling salesman problem and the $1/N^2$ corrections to the bipartite matching, however the
computations become rather long and they have not been done.

Summarizing, statistical mechanics techniques, based on the replica approach or the
cavity approach, are able to provide the exact solution to these models in the thermodynamic
limit: they can also be used to compute the leading finite size correction (that is a highly
non-trivial test).

These methods work very well when there is no structure in the ensemble of
Hamiltonians (i.e. the d's are independent identically distributed variables). A different
situation may arise in other cases. For example let us define a three dimensional bipartite
matching problem: we consider a cube of size 1, we extract randomly N points $x_i$ and $y_i$
inside the cube and we construct the matrix d using the Euclidean distance among the
points:

$$ d_{i,k} = |x_i - y_k| . \qquad (21) $$

In this way the distances are correlated random variables (e.g. they satisfy the triangle
inequality) and the exact computation of the minimal energy state is at least as difficult as
the computation of the free energy of the three dimensional Ising model. Both problems have
a three dimensional structure that is absent in models where the distances are uncorrelated.

The situation where the distances are uncorrelated plays the role of a mean field theory
and the results in finite dimensions can be approximately obtained by doing a perturbative
expansion around the uncorrelated mean field case. The results are rather good (there
are still some points that are not so well understood), however the description of these
techniques would lead us too far from the main goal of these lectures.

Footnote 6: NP does not mean non-polynomial, but Polynomial on a Non-standard machine [3],
e.g. a computer that has an unbounded number of nodes that work in parallel; however it is
quite likely that this misinterpretation of the name is not far from reality.

4 Constraint optimization 

4.1 General considerations 

Constraint optimization is a particular case of combinatorial optimization and it will be
the main subject of these lectures. Let us consider a simple case: a configuration of our
system is composed of N variables $\sigma_i$ that may take q values (e.g. from 1 to q).
These values usually have some meaning in real life, but this does not necessarily concern
us. An instance of the problem is characterized by M functions $c_k[\sigma]$ (k = 1, M); each function
takes only the values 0 or 1 and depends on a small number of the $\sigma$ variables.
Let us consider the following example with N = 4 and M = 3:

$$ c_1[\sigma] = \theta(\sigma_1 \sigma_2 - \sigma_3 \sigma_4) , \qquad (22) $$
$$ c_2[\sigma] = \theta(\sigma_1 \sigma_3 - \sigma_2 \sigma_4) , $$
$$ c_3[\sigma] = \theta(\sigma_1 \sigma_4 - \sigma_2 \sigma_3) , $$

where the function $\theta(x)$ is 1 if the argument is non-negative and it is 0 if the argument is
negative. The function we want to minimize is

$$ H[\sigma] = \sum_{k=1,M} c_k[\sigma] . \qquad (23) $$

In particular we are interested to know if there is a minimum with $H[\sigma] = 0$. If this
happens all the functions must be zero. It is quite evident that imposing the condition
$H[\sigma] = 0$ is equivalent to finding the solution to the following inequalities:

$$ \sigma_3 \sigma_4 > \sigma_1 \sigma_2 , \qquad (24) $$
$$ \sigma_2 \sigma_4 > \sigma_1 \sigma_3 , $$
$$ \sigma_2 \sigma_3 > \sigma_1 \sigma_4 . $$

In other words each function imposes a constraint and the function H is zero if all the
constraints are satisfied. If this happens, the minimal total energy is zero and we are in the
satisfiable case. On the contrary, if it is not possible to satisfy all the constraints, the minimal
total energy is different from zero and we are in the unsatisfiable case. In this case the
minimum value of H is the minimum number of constraints that have to be violated. It is
clear that for each set of inequalities of the previous kind there are arithmetic techniques to
find out if there is a solution and, if any, to count their number.
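
For such a small example the solutions can simply be enumerated. The sketch below assumes that the three constraints are read as $\theta(\sigma_1\sigma_2 - \sigma_3\sigma_4)$, $\theta(\sigma_1\sigma_3 - \sigma_2\sigma_4)$ and $\theta(\sigma_1\sigma_4 - \sigma_2\sigma_3)$, with $\theta(x) = 1$ for non-negative x; the function names are introduced here for illustration.

```python
import itertools

def theta(x):
    """Step function: 1 if the argument is non-negative, 0 otherwise."""
    return 1 if x >= 0 else 0

def energy(sigma):
    """Number of violated constraints for the N = 4, M = 3 example."""
    s1, s2, s3, s4 = sigma
    return (theta(s1 * s2 - s3 * s4)
            + theta(s1 * s3 - s2 * s4)
            + theta(s1 * s4 - s2 * s3))

def zero_energy_configurations(q):
    """All configurations with values in 1..q that satisfy every constraint."""
    values = range(1, q + 1)
    return [s for s in itertools.product(values, repeat=4) if energy(s) == 0]

if __name__ == "__main__":
    for q in (1, 2, 3, 4):
        print(q, len(zero_energy_configurations(q)))
```

For q = 2 the only solution is $\sigma = (1, 2, 2, 2)$. Multiplying the three inequalities gives $\sigma_2 \sigma_3 \sigma_4 > \sigma_1^3$, which explains why $\sigma_1$ must be smaller than the largest of the other variables.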

4.2 The thermodynamic limit 

Given N and M we can easily define an ensemble as all the possible different sets of M
inequalities of the type

$$ \sigma_{i_1(k)} \sigma_{i_2(k)} > \sigma_{i_3(k)} \sigma_{i_4(k)} . \qquad (25) $$

The interesting limit is when N goes to infinity with

$$ M = N \alpha , \qquad (26) $$

$\alpha$ being a parameter. Hand waving arguments suggest that for small $\alpha$ it should be
possible to satisfy all the constraints, while for very large $\alpha$ even in the best case most of
the constraints will not be satisfied. It is believed that in this and in many other similar
problems there is a phase transition from a satisfiable to a non-satisfiable phase. More
precisely let us define the energy density

$$ e(\alpha, N) = \frac{\overline{H^*}}{N} , \qquad (27) $$

where the average is done over all possible constraints over the set of N variables. The
results should depend on q but we have not indicated this dependence.

Usual arguments of statistical mechanics imply that the following limit is well defined

$$ e(\alpha) = \lim_{N \to \infty} e(\alpha, N) \qquad (28) $$

and that sample to sample fluctuations vanish when N goes to infinity:

$$ \frac{\overline{(H^*)^2} - \left(\overline{H^*}\right)^2}{N^2} \to 0 . \qquad (29) $$

According to the previous discussion there must be a phase transition at a critical value
$\alpha_c$ of $\alpha$, such that

$$ e(\alpha) = 0 \quad \text{for } \alpha \le \alpha_c , \qquad (30) $$
$$ e(\alpha) > 0 \quad \text{for } \alpha > \alpha_c . $$

A simple argument shows that $e(\alpha)$ is a continuous differentiable function, so that the
satisfiability-unsatisfiability transition cannot be a first order transition in the
thermodynamic sense.
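
The behaviour of the energy density at small and large ratio M/N can be explored numerically for very small N. The following is a rough sketch under explicit assumptions that are not in the text: each constraint involves four distinct randomly chosen variables, the minimum is found by exhaustive enumeration, and N, q, the number of samples and the seed are arbitrary illustrative choices.

```python
import itertools
import random

def random_instance(n, m, rng):
    """m constraints; (a, b, c, d) stands for sigma[a]*sigma[b] > sigma[c]*sigma[d].

    Assumption: the four indices of a constraint are distinct.
    """
    return [tuple(rng.sample(range(n), 4)) for _ in range(m)]

def min_violations(n, q, instance):
    """Minimum number of violated constraints, by exhaustive enumeration."""
    best = len(instance)
    for sigma in itertools.product(range(1, q + 1), repeat=n):
        violated = sum(1 for a, b, c, d in instance
                       if not sigma[a] * sigma[b] > sigma[c] * sigma[d])
        best = min(best, violated)
        if best == 0:
            break
    return best

def energy_density(n, q, alpha, samples, seed=0):
    """Estimate of the average minimal energy divided by N, with M = alpha N."""
    rng = random.Random(seed)
    m = round(alpha * n)
    total = sum(min_violations(n, q, random_instance(n, m, rng))
                for _ in range(samples))
    return total / (samples * n)

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 4.0):
        print(alpha, energy_density(6, 3, alpha, 10))
```

With such small sizes no sharp transition can be seen; the sketch only shows that the estimated energy density is close to zero at small ratio and clearly positive at large ratio.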

The goal of the statistical mechanics approach is to compute $\alpha_c$ and eventually $e(\alpha)$. We
would like also to compute the entropy density, that is related to the number of zero energy
configurations, but we will not address this point in this lecture. We will see that in order
to compute $\alpha_c$ we will need a generalization of the entropy, i.e. the complexity $\Sigma(\alpha)$ that
is the exact equivalent for this problem of the configurational entropy used in the study of
glasses.

This random inequality model has never been studied, as far as I know, however the
computation of $\alpha_c$ should be relatively straightforward, for not too large values of q. We
will consider in section 6 a much more studied problem: the coloring of a random
graph with q different colors.



5 An intermezzo on random graphs 

The definition of random graphs has been discussed in Havlin's lectures, however it is
convenient to recall here the main properties.

There are many variants of random graphs: fixed local coordination number, Poisson
distributed local coordination number, bipartite graphs... They have the same main
topological structures in the limit where the number (N) of nodes goes to infinity.

We start by defining the random Poisson graph in the following way: given N nodes
we consider the ensemble of all possible graphs with $M = \alpha N$ edges (or links). A random
Poisson graph is a generic element of this ensemble.

The first quantity we can consider for a given graph is the local coordination number
$z_i$, i.e. the number of nodes that are connected to the node i. The average coordination
number z is the average over the graph of the $z_i$:

$$ z = \frac{1}{N} \sum_{i=1,N} z_i . \qquad (31) $$

In this case it is evident that

$$ z = 2 \alpha . \qquad (32) $$

It takes a little more work to show that in the thermodynamic limit ($N \to \infty$) the
probability distribution of the local coordination number is a Poisson distribution with average
z.

In a similar construction two random points i and k are connected with a probability
that is equal to z/(N - 1). Here it is trivial to show that the probability distribution
of the $z_i$ is Poisson, with average z. The total number of links is just zN/2, apart from
corrections proportional to $\sqrt{N}$. The two Poisson ensembles, i.e. fixed total number of links
and fluctuating total number of links, cannot be distinguished locally for large N and most
of the properties are the same.

Random lattices with fixed coordination number z can be easily defined; the ensemble
is just given by all the graphs with $z_i = z$ and a random graph is just a generic element of
this ensemble.

One of the most important facts about these graphs is that they are locally a tree, i.e.
they are locally cycleless. In other words, if we take a generic point i and we consider the
subgraph composed by those points that are at a distance less than d on the graph (footnote 7), this
subgraph is a tree with probability one when N goes to infinity at fixed d. For finite N this
probability is very near to 1 as soon as

$$ d < A(z) \ln(N) , \qquad (33) $$

A(z) being an appropriate function. For large N this probability is given by 1 - O(1/N).

If z > 1 the nodes percolate and a finite fraction of the graph belongs to a single giant
connected component. Cycles (or loops) do exist on this graph, but they have typically a
length proportional to ln(N). Also the diameter of the graph, i.e. the maximum distance
between two points of the same connected component, is proportional to ln(N). The absence
of small loops is crucial because we can study the problem locally on a tree and we have
eventually to take care of the large loops (that cannot be seen locally) in a self-consistent
way, i.e. as a boundary condition at infinity. This problem will be studied explicitly in the
next section for the ferromagnetic Ising model.

Footnote 7: The distance between two nodes i and k is the minimum number of links that we
have to traverse in going from i to k.
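
The statements on the coordination numbers can be checked on a sampled graph. The sketch below is an illustration only: it draws a graph with a fixed number M of distinct links (M equal to the ratio `alpha` times N), computes the local coordination numbers and compares their histogram with a Poisson distribution; N, `alpha` and the seed are arbitrary choices made here.

```python
import math
import random
from collections import Counter

def random_graph(n, m, rng):
    """Sample m distinct edges uniformly among the n(n-1)/2 possible ones."""
    edges = set()
    while len(edges) < m:
        i, k = rng.sample(range(n), 2)
        edges.add((min(i, k), max(i, k)))
    return sorted(edges)

def coordination_numbers(n, edges):
    """Local coordination number of each node."""
    z = [0] * n
    for i, k in edges:
        z[i] += 1
        z[k] += 1
    return z

def poisson(mean, j):
    """Poisson probability of the value j."""
    return math.exp(-mean) * mean ** j / math.factorial(j)

n, alpha = 2000, 1.5
rng = random.Random(0)
edges = random_graph(n, round(alpha * n), rng)
z_local = coordination_numbers(n, edges)
z_mean = sum(z_local) / n          # equal to 2 * alpha by construction
histogram = Counter(z_local)

if __name__ == "__main__":
    print("average coordination number:", z_mean)
    for j in range(8):
        print(j, histogram[j] / n, poisson(z_mean, j))
```

The average coordination number is exactly twice the ratio M/N because each link contributes to two nodes, while the agreement with the Poisson distribution holds only up to statistical fluctuations at finite N.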
5.1 The Bethe Approximation

Random graphs are sometimes called Bethe lattices, because a spin model on such a graph 
can be solved exactly using the Bethe approximation. Let us recall the Bethe approximation 
for the two dimensional Ising model. 

In the standard mean field approximation, one writes a variational principle assuming
that all the spins are not correlated; at the end of the computation one finds that the
magnetization satisfies the well known equation

$$ m = \mathrm{th}(\beta J z m) , \qquad (34) $$

where z = 4 on a square lattice (z = 2d in d dimensions) and J is the spin coupling (J > 0
for a ferromagnetic model). This well studied equation predicts that the critical point (i.e.
the point where the magnetization vanishes) is $\beta_c = 1/z$ (in units where J = 1). This result is
not very exciting in two dimensions (where $\beta_c \approx 0.44$) and it is very bad in one dimension
(where $\beta_c = \infty$). On the other hand it becomes more and more correct when $d \to \infty$.
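
The mean field equation m = th($\beta$ J z m) can be solved by simple iteration. The following is a minimal sketch (the function name, the starting point m = 1 and the number of iterations are choices made here); it shows that the solution vanishes when $\beta J z < 1$ and is non-zero when $\beta J z > 1$.

```python
import math

def mean_field_magnetization(beta, J=1.0, z=4, iterations=10000):
    """Iterate m -> th(beta * J * z * m), starting from m = 1."""
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(beta * J * z * m)
    return m

if __name__ == "__main__":
    for beta in (0.15, 0.20, 0.30, 0.50):
        print(beta, mean_field_magnetization(beta))
```

Starting from m = 1 selects the positive solution when it exists; very close to the critical point the iteration converges slowly and more iterations would be needed.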

A better approximation can be obtained if we look at the system locally and we compute
the magnetization of a given spin ($\sigma$) as a function of the magnetization of the nearby spins
($\tau_i$, i = 1, 4). If we assume that the spins $\tau$ are uncorrelated, but have magnetization m, we
obtain that the magnetization of the spin $\sigma$ (let us call it $m_0$) is given by:

$$ m_0 = \sum_{\tau} P_m[\tau] \, \mathrm{th}\Big(\beta J \sum_{i=1,4} \tau_i\Big) , \qquad (35) $$

where

$$ P_m[\tau] = \prod_{i=1,4} P_m(\tau_i) , \qquad P_m(\tau) = \frac{1+m}{2} \delta_{\tau,1} + \frac{1-m}{2} \delta_{\tau,-1} . \qquad (36) $$

The sum over all the $2^4$ possible values of the $\tau$ can be easily done.
If we impose the self-consistent condition

$$ m_0(m) = m , \qquad (37) $$

we find an equation that enables us to compute the value of the magnetization m.

This approximation remains unnamed (as far as I know) because with a little more work
we can get the better and simpler Bethe approximation. The drawback of the previous
approximation is that the spins $\tau$ cannot be uncorrelated because they interact with the
same spin $\sigma$: the effect of this correlation can be taken into account and this leads to the
Bethe approximation.

Let us consider the system where the spin $\sigma$ has been removed. There is a cavity in the
system and the spins $\tau$ are on the border of this cavity. We assume that in this situation
these spins are uncorrelated and they have a magnetization $m_C$. When we add the spin $\sigma$,
we find that the probability distribution of this spin is proportional to

$$ \sum_{\tau} P_{m_C}[\tau] \exp\Big(\beta J \sigma \sum_{i=1,4} \tau_i\Big) . \qquad (38) $$

The magnetization of the spin $\sigma$ can be computed and after some simple algebra we get

$$ m = \mathrm{th}\{ z \, \mathrm{arth}[\mathrm{th}(\beta J) m_C] \} , \qquad (39) $$

with z = 4.
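
The simple algebra can be spelled out as follows. This is a sketch for z independent cavity spins of magnetization $m_C$; the shorthand $w(\sigma)$ and $t$ is introduced here and is not in the text.

```latex
% Weight of the central spin \sigma = \pm 1, using
% \sum_\tau P_{m_C}(\tau) e^{\beta J \sigma \tau}
%   = \cosh(\beta J) + \sigma \, m_C \sinh(\beta J) :
\[
  w(\sigma) = \prod_{i=1}^{z} \sum_{\tau_i = \pm 1}
              P_{m_C}(\tau_i) \, e^{\beta J \sigma \tau_i}
            = \left[ \cosh(\beta J) \right]^z \left( 1 + \sigma t \right)^z ,
  \qquad t \equiv \mathrm{th}(\beta J) \, m_C .
\]
% The magnetization follows from the normalized weights, with
% (1 + t)/(1 - t) = e^{2 \, \mathrm{arth}(t)} :
\[
  m = \frac{w(+1) - w(-1)}{w(+1) + w(-1)}
    = \frac{(1 + t)^z - (1 - t)^z}{(1 + t)^z + (1 - t)^z}
    = \mathrm{th}\{ z \, \mathrm{arth}(t) \} .
\]
```

Removing one more neighbour before adding the spin replaces z by z - 1 in the last formula, since only z - 1 cavity spins then act on the added one.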

This seems to be a minor progress because we do not know $m_C$. However we are very
near the final result. We can remove one of the spins $\tau_i$ and form a larger cavity (two spins
removed). If in the same vein we assume that the spins on the border of the cavity are
uncorrelated and they have the same magnetization $m_C$, we obtain

$$ m_C = \mathrm{th}\{ (z - 1) \, \mathrm{arth}[\mathrm{th}(\beta J) m_C] \} . \qquad (40) $$

Solving this last equation we can find the value of $m_C$ and using the previous equation we
can find the value of m.

It is rather satisfactory that in 1 dimension (z = 2) the cavity equations become

$$ m_C = \mathrm{th}(\beta J) m_C . \qquad (41) $$

This equation for finite $\beta$ has no non-zero solutions, as it should be.
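
The cavity equation can be solved numerically in the same way as the mean field one. The sketch below is an illustration, not the author's procedure: it iterates $m_C \to \mathrm{th}\{(z-1)\,\mathrm{arth}[\mathrm{th}(\beta J) m_C]\}$ from $m_C = 1$ and then evaluates $m = \mathrm{th}\{z\,\mathrm{arth}[\mathrm{th}(\beta J) m_C]\}$; the function names, the clamp and the iteration count are choices made here.

```python
import math

def cavity_magnetization(beta, J=1.0, z=4, iterations=10000):
    """Iterate the cavity equation for m_C, starting from m_C = 1."""
    t = math.tanh(beta * J)
    m_c = 1.0
    for _ in range(iterations):
        # The clamp keeps the argument of atanh strictly below 1.
        m_c = math.tanh((z - 1) * math.atanh(min(t * m_c, 1.0 - 1e-15)))
    return m_c

def magnetization(beta, J=1.0, z=4):
    """Magnetization m obtained from the fixed point m_C."""
    t = math.tanh(beta * J)
    m_c = cavity_magnetization(beta, J, z)
    return math.tanh(z * math.atanh(min(t * m_c, 1.0 - 1e-15)))

def critical_beta(J=1.0, z=4):
    """Linearizing around m_C = 0 gives (z - 1) th(beta_c J) = 1 (needs z > 2)."""
    return math.atanh(1.0 / (z - 1)) / J

if __name__ == "__main__":
    print("beta_c for z = 4:", critical_beta())
    for beta in (0.30, 0.40, 0.50):
        print(beta, magnetization(beta))
```

For z = 4 and J = 1 the linearized equation gives $\beta_c = \mathrm{arth}(1/3) \approx 0.347$, which lies between the mean field value 0.25 and the two dimensional value of about 0.44; for z = 2 the iteration always flows to $m_C = 0$ at finite $\beta$.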

The internal energy can be computed in a similar way: we get that the energy density
per link is given by

$$ E_{link} = - \frac{\mathrm{th}(\beta J) + m_C^2}{1 + \mathrm{th}(\beta J) m_C^2} , \qquad (42) $$

and we can obtain the free energy by integrating the internal energy as a function of $\beta$.
In a more sophisticated treatment we write the free energy as a function of $m_C$:

$$ \beta F(m_C) = F_{site}(m_C) - \frac{z}{2} F_{link}(m_C) , \qquad (43) $$

where $F_{link}(m_C)$ and $F_{site}(m_C)$ are appropriate functions [17]. This free energy is variational,
in other words the equation

$$ \frac{\partial F}{\partial m_C} = 0 \qquad (44) $$

coincides with the cavity equation (40). However for lack of space we will not discuss this
interesting approach [17].



5.2 Bethe lattices and replica symmetry breaking 

It should be now clear why the Bethe approximation is correct for random lattices. If we
remove a node of a random lattice, the nearby nodes (that were at distance 2 before) are
now at a very large distance, i.e. O(ln(N)). In this case we can write

and everything seems easy. 

This is actually easy in the ferromagnetic case where in absence of magnetic field at low
temperature the magnetization may take only two values ($\pm m$). In more complex cases
(e.g. antiferromagnets) there are many different possible values of the magnetization because
there are many equilibrium states and everything becomes complex (as it should) because
the cavity equations become equations for the probability distribution of the magnetizations.

This case has been long studied in the literature and for historical reasons it is usually
said that the replica symmetry is spontaneously broken [20]. Fortunately for the
aims of this lecture we need only a very simple form of replica symmetry breaking (we are
interested in the zero temperature case) and we are not going to describe the general
formalism.

6 Coloring a graph



6.1 Basic definitions 

For a given graph G we would like to know if using q colors the graph can be colored in
such a way that adjacent nodes have different colors. It is convenient to introduce the
Hamiltonian

$$ H_G = \sum_{i,k} A_G(i,k) \, \delta_{\sigma_i, \sigma_k} , \qquad (46) $$

where $A_G(i,k)$ is the adjacency matrix (i.e. 1 if the two nodes are connected, 0 elsewhere)
and the variable $\sigma$ may take values that go from 1 to q. This Hamiltonian describes the
antiferromagnetic Potts model with q states. The graph G is colorable if and only if the
ground state energy of this Hamiltonian ($E_G$) is zero.
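
On a small graph the ground state energy can be found by enumeration. The sketch below is only an illustration with two conventions chosen here: each link is counted once in the energy, and the colors are labelled from 0 to q - 1 instead of 1 to q.

```python
import itertools

def coloring_energy(edges, sigma):
    """Number of links whose two end nodes carry the same color."""
    return sum(1 for i, k in edges if sigma[i] == sigma[k])

def ground_state_energy(n, edges, q):
    """Brute-force minimum over the q**n colorings (small graphs only)."""
    return min(coloring_energy(edges, sigma)
               for sigma in itertools.product(range(q), repeat=n))

triangle = [(0, 1), (1, 2), (0, 2)]

if __name__ == "__main__":
    for q in (2, 3):
        print(q, ground_state_energy(3, triangle, q))
```

A triangle cannot be colored with two colors (one link is always violated), while three colors are enough.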

For large N on a random graph we expect that the energy density

$$ \frac{E_G}{N} = e(z) \qquad (47) $$

does not depend on G: it should depend only on the average coordination number z of the
graph. It can be proved that

$$ e(z) = 0 \quad \text{for } z < 1 , \qquad (48) $$
$$ e(z) \propto \sqrt{z} \quad \text{for } z \to \infty . $$

Consequently there must be a phase transition at $z_c$ between the colorable phase e(z) = 0
and the uncolorable phase $e(z) \neq 0$. Although it is possible to compute the function e(z)
for all z, in these lectures we address only the simpler problem of computing the value of
$z_c$.

For q = 2, $z_c = 1$, as can be intuitively understood: odd loops cannot be colored and
for z > 1 there are many large loops that are even and odd with equal probability.
The q = 2 case is an antiferromagnetic Ising model on a random graph, i.e. a standard spin
glass [22].
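
The role of odd loops for q = 2 can be made concrete: a graph is colorable with two colors exactly when it contains no loop of odd length, and this can be tested with a breadth-first search. The sketch below is a standard check written for illustration; the function names are introduced here.

```python
from collections import deque

def is_two_colorable(n, edges):
    """Return True if the graph with nodes 0..n-1 contains no odd loop."""
    adjacency = [[] for _ in range(n)]
    for i, k in edges:
        adjacency[i].append(k)
        adjacency[k].append(i)
    color = [None] * n
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for k in adjacency[i]:
                if color[k] is None:
                    color[k] = 1 - color[i]
                    queue.append(k)
                elif color[k] == color[i]:
                    return False  # an odd loop has been closed
    return True

def cycle(length):
    """Edges of a single loop with the given number of nodes."""
    return [(i, (i + 1) % length) for i in range(length)]

if __name__ == "__main__":
    print(is_two_colorable(5, cycle(5)), is_two_colorable(6, cycle(6)))
```

A loop of 5 nodes fails the test and a loop of 6 nodes passes it; trees always pass, since they contain no loops at all.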

6.2 The cavity equations 

Let us start with the basic definitions. Let us consider a legal coloring (i.e. all adjacent nodes
have different colors). We take a node i and we consider the subgraph of the nodes at distance
at most d from i. Let us call B(i, d) the interior of this graph. With probability one
(when N goes to infinity) this graph is a tree and the nodes at distance d
are leaves of this tree (there may also be other leaves at shorter distance). In the following
we shall assume that this graph is a tree, and all the statements we make will be valid
only with probability one when N goes to infinity.
We ask the following questions: 

• Are there other legal colorings of the graph that coincide with the original coloring
outside B(i, d) and differ inside B(i, d)? (Let us call the set of all these colorings
C(i, d).)

• Which is the list of colors that the node i may have in one of the colorings belonging
to C(i, d)? (Let us call this list L(i, d). This list depends on the legal configuration
$\sigma$; however, to lighten the notation we will not indicate this dependence in this
section.)

The cardinality of L(i, d) does not decrease with d, so that L(i, d) must have a limit when d goes
to infinity (still remaining in the region $d \ll A(z) \ln(N)$). We call this limit L(i). We should
not forget that L(i) is fuzzy for finite N, but it becomes sharper and sharper when N
goes to infinity. In other words, L(i) is the list of all the possible colors that the site i may
have if we change only the colors of the nearby nodes and we do not change the colors of
faraway nodes fIS\ El] .

Let us study what happens on a graph where the site i has been removed. We denote
by k a node adjacent to i and we call L(k; i, d) the list of the possible colors of the node k
while the colors outside B(i, d) are fixed. The various nodes k do not interact directly and
in this situation each of them can take a color independently of the colors of the others.

In this situation it is evident that L(i, d+1) can be written as a function of all the L(k; i, d).
In order to write explicit equations it is convenient to indicate by an overbar ($\overline{L}$) the list of
the forbidden colors (see footnote 8). Let us indicate by F(L) the list of colors that must be forbidden
at the nodes that are adjacent to a node where the list of allowed colors is L. Barring the
case where the list L is empty, it is easy to obtain that:

• F(L) = L if L contains only one element.

• F(L) = $\emptyset$ ($\emptyset$ being the empty set) if L contains more than one element.

With these preliminaries we have that

$$\overline{L}(i,d+1) = \bigcup_k F(L(k;i,d)) , \tag{49}$$

where k runs over all the nodes adjacent to i, and the union is taken in the set-theoretical
sense.

The previous formula can be easily put into words. We have to consider all
the neighbours (k) of the node i: if a neighbour may be colored in two or more ways, it imposes
no constraint; if it can be colored in only one way, it forbids the node i to have its color.
Considering all nearby nodes we construct the list of the forbidden colors, and the list of the
allowed colors is composed of those colors that are not forbidden.
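
The rule just described can be sketched in a few lines of Python. This is only an illustration under simple assumptions: colors are the integers 1..q, lists of allowed colors are Python sets, and the function names are hypothetical.

```python
def forbidden_for_neighbours(allowed):
    """The map F(L): colors that a node with allowed list L forbids to its neighbours."""
    if not allowed:
        # The text explicitly bars the case of an empty list.
        raise ValueError("empty list of allowed colors")
    # A node frozen to a single color forbids it; otherwise no constraint.
    return set(allowed) if len(allowed) == 1 else set()


def allowed_colors(q, neighbour_lists):
    """Allowed list of node i, given the lists L(k; i, d) of its neighbours k.

    The forbidden colors are the union of F(L(k; i, d)) over the neighbours;
    the allowed colors are the complement of that union in {1, ..., q}.
    """
    forbidden = set()
    for neighbour_list in neighbour_lists:
        forbidden |= forbidden_for_neighbours(neighbour_list)
    return set(range(1, q + 1)) - forbidden


# One neighbour frozen to color 1, one free to choose between 2 and 3.
print(allowed_colors(3, [{1}, {2, 3}]))  # {2, 3}
# Two neighbours frozen to colors 1 and 2: only color 3 is left.
print(allowed_colors(3, [{1}, {2}]))     # {3}
```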

The previous formula is exact (with probability one as usual). The next formula is also
valid with probability one:

$$\overline{L}(i,d) = \bigcup_k F(L(k;i,d)) , \tag{50}$$

because in most cases L(i, d+1) = L(i, d).

If we take the limit $d \to \infty$ in either of the two previous formulae we get

$$\overline{L}(i) = \bigcup_k F(L(k;i)) . \tag{51}$$

8 The list of forbidden colors is just the list of colors that do not belong to the list of possible colors.
From the point of view of set theory the set of forbidden colors is just the complement of the set of possible
colors. Using this notation we obviously have $\overline{\overline{L}} = L$.

Of course everything is true with probability 1 when $N \to \infty$.

A further simplification in the previous formulae may be obtained if we associate to a
list L a variable $\omega$ that takes values from 0 to q, defined as follows:

• The variable $\omega$ is equal to i if the list contains only the i-th color.

• The variable $\omega$ is equal to 0 if the list contains more than one color.

Let us call $\Omega(L)$ this mapping.

In a nutshell we have introduced an extra color, white, and we say that a site is white
if it can be colored in more than one way without changing the colors of the faraway sites
|23 [ I24 [ I2"K]. The rationale for introducing the variable $\omega$ is that F(L) depends only on $\Omega(L)$.
The previous equations induce equations for the variables $\omega$, i.e.

$$\omega(i) = \Omega\Big( \overline{\bigcup_k F(\omega(k;i))} \Big) . \tag{52}$$

In the same way as before, we have to consider all the neighbours (k) of the node i: if a
neighbour is white, it imposes no constraint; if it is colored, it forbids the node i to have its
color. Considering all nearby nodes we construct the list of the forbidden colors. If more
than one color is not forbidden, the node is white; if only one color is not forbidden, the
node has this color.
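
In terms of the variables that take values from 0 (white) to q, the rule above becomes a simple function of the colors received from the neighbours. The sketch below is an illustration only; the names `WHITE`, `omega_of_list` and `node_color` are assumptions introduced here.

```python
WHITE = 0  # the extra color: more than one ordinary color is possible


def omega_of_list(allowed):
    """Map a non-empty list of allowed colors to a single variable in 0..q."""
    if not allowed:
        raise ValueError("empty list of allowed colors")
    if len(allowed) == 1:
        return next(iter(allowed))  # the only possible color
    return WHITE


def node_color(q, incoming):
    """Color (white included) of a node, given the colors sent by its neighbours.

    White neighbours impose no constraint; a colored neighbour forbids its color.
    """
    forbidden = {c for c in incoming if c != WHITE}
    allowed = set(range(1, q + 1)) - forbidden
    return omega_of_list(allowed)


print(node_color(3, [1, 2, WHITE]))      # 3: only one color is not forbidden
print(node_color(3, [1, WHITE, WHITE]))  # 0: colors 2 and 3 are both possible
```

Dropping one neighbour from `incoming` gives the corresponding cavity quantity, i.e. the information sent to that neighbour.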

The previous equation is just the generalization of the earlier equation for the local
magnetizations: here we have the colors, white included, instead of the magnetizations.
We have discrete, not continuous, variables because we are interested in the ground state,
not in the behaviour at finite temperature. We also have to write down the equivalent of
the cavity equations, i.e. the equations for the quantities $\omega(i;l)$. They are given by

$$\omega(i;l) = \Omega\Big( \overline{\bigcup_{k \neq l} F(\omega(k;i))} \Big) . \tag{53}$$

We can also say that $\omega(i;l)$ is the information that is transmitted from the node i to the
node l; it is computed using the information transmitted by all the nearby nodes, with
the exclusion of l.

The previous equations are called the belief equations (sometimes the strong belief equations,
in order to distinguish them from the weak belief equations that are valid at non-zero
temperature). We can associate to any legal coloring a solution (or a quasi-solution) of
the belief equations in a constructive way. Sometimes the solution of the belief equations
is called a whitening [2211211, because some nodes that were colored in the starting legal
configuration become white.

The reader should notice that at this stage we can only say that the belief equations
should be satisfied in a fraction of nodes that goes to 1 when N goes to infinity. However,
it is possible that the total number of nodes where the belief equations are not satisfied
remains finite or goes to infinity with N, and for this reason in general we can only say that
these equations have quasi-solutions, not true solutions [23 121].
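
One possible way to make the constructive association between a legal coloring and a whitening explicit is sketched below. The procedure is an assumption made for illustration (the lectures do not spell it out): every node starts by sending its own color along each edge, and all messages are then updated in parallel, each one computed from the messages arriving from the other neighbours, until nothing changes.

```python
WHITE = 0


def combine(q, incoming):
    """Color (white included) implied by the colors sent by the neighbours."""
    forbidden = {c for c in incoming if c != WHITE}
    allowed = [c for c in range(1, q + 1) if c not in forbidden]
    if not allowed:
        raise ValueError("contradiction: every color is forbidden")
    return allowed[0] if len(allowed) == 1 else WHITE


def whiten(n_nodes, edges, coloring, q):
    """Whitening of a legal coloring on a simple graph (illustrative sketch).

    messages[(i, l)] is the information sent from node i to its neighbour l,
    computed from all the neighbours of i except l.
    """
    adjacency = [[] for _ in range(n_nodes)]
    for i, k in edges:
        adjacency[i].append(k)
        adjacency[k].append(i)

    # Start from the legal coloring: every node announces its own color.
    messages = {(i, l): coloring[i] for i in range(n_nodes) for l in adjacency[i]}
    while True:
        updated = {
            (i, l): combine(q, [messages[(k, i)] for k in adjacency[i] if k != l])
            for (i, l) in messages
        }
        if updated == messages:
            break
        messages = updated

    # Final color of each node, using the messages from all its neighbours.
    return [
        combine(q, [messages[(k, i)] for k in adjacency[i]])
        for i in range(n_nodes)
    ]


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(whiten(4, square, [1, 2, 1, 2], q=2))  # [1, 2, 1, 2]: every node is frozen
print(whiten(4, square, [1, 2, 1, 2], q=3))  # [0, 0, 0, 0]: every node is white
```

Starting from a legal coloring, a message can only stay equal to the color of the sender or turn white, so the parallel update stops after a finite number of steps.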




7 Analytic computations



7.1 An interpretation 

The main reason for introducing the whitening is that each legal coloring has many other
colorings nearby that differ only by the change of the colors of a small number of nodes. The
number of legal colorings that can be reached from a given coloring by making this kind
of move is usually exponentially large. On the other hand, two colorings that differ only
by the change of the colors of a small number of nodes correspond to the same whitening.
Let us be more precise. For each coloring let us write

$$\omega(i|\{\sigma\}) = \omega(i,d|\{\sigma\}) \quad \text{for } 1 \ll d \ll A(z) \ln(N) , \tag{54}$$

where we have indicated in an explicit way that the whitening $\omega(i|\{\sigma\})$ depends on the legal
coloring $\{\sigma\}$.

We say that two whitenings are equivalent if they differ in a fraction of nodes less than
$\epsilon(N)$, where $\epsilon(N)$ is a function that goes to zero when $N \to \infty$. For finite N everything is
fuzzy and depends on the precise choice of d and $\epsilon$. Let us suppose that for large N, with
probability one, two whitenings are either equivalent or differ in a large fraction of nodes.

Generally speaking we have three possibilities. 

• For all the legal configurations the corresponding whitenings ($\omega(\{\sigma\})$) are such that
$\omega(i|\{\sigma\}) = 0$, i.e. all nodes are white.

• For a generic legal configuration the corresponding whitening is non-trivial, i.e. for
a finite fraction of the nodes $\omega(i|\{\sigma\}) \neq 0$.

• The graph is not colorable and there are no legal configurations. 

In the second case we would like to know how many whitenings there are, how they
differ and what their properties are, e.g. how many sites are colored. We shall see in the
next section how these properties may be computed analytically, and as a byproduct we will
find the value of z ($z_c$) that separates the colorable phase from the uncolorable phase.

In the case where there are many whitenings one can argue that the set of all the
legal configurations breaks into a large number of different disconnected regions that are
called by many different names in the physical literature [3U1 EH EH| (states, valleys,
clusters, lumps...). Roughly speaking, the set of all the legal configurations can be naturally
decomposed into clusters of proximate configurations, while configurations belonging to
different clusters (or regions) are not close. The precise definition of these regions is rather
complex 0]; roughly speaking, we could say that two legal configurations belong to the
same region if they are in some sense adjacent, i.e. they belong to different regions if their
Hamming distance is greater than $\epsilon N$. In this way the precise definition of these regions
depends on $\epsilon$; however, it can be argued that there is an interval in $\epsilon$ where the definition
is non-trivial and is independent of the value of $\epsilon$ (see footnote 9). It is usually assumed that each
whitening is associated to a different cluster of legal solutions.

9 For a rigorous definition of these regions see |271 1281 OU) . 
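
The decomposition of the legal configurations into regions can be explored by brute force on very small graphs. The sketch below is a toy illustration under a specific assumption: two legal colorings are put in the same cluster when they are linked by a chain of legal colorings, each step changing at most `max_dist` nodes. This is only one possible reading of "in some sense adjacent"; the function names are hypothetical and the enumeration is exponential in the number of nodes.

```python
from itertools import product


def legal_colorings(n_nodes, edges, q):
    """All colorings with q colors in which adjacent nodes differ (brute force)."""
    return [
        sigma
        for sigma in product(range(1, q + 1), repeat=n_nodes)
        if all(sigma[i] != sigma[k] for i, k in edges)
    ]


def hamming(a, b):
    """Number of nodes on which two colorings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_clusters(colorings, max_dist):
    """Number of groups of colorings connected by steps of at most max_dist nodes."""
    parent = list(range(len(colorings)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(colorings)):
        for b in range(a + 1, len(colorings)):
            if hamming(colorings[a], colorings[b]) <= max_dist:
                parent[find(a)] = find(b)
    return len({find(x) for x in range(len(colorings))})


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_clusters(legal_colorings(4, square, 2), max_dist=1))  # 2
print(count_clusters(legal_colorings(4, square, 3), max_dist=1))  # 1
```

With two colors the square has two legal colorings that differ on every node, so they form two separate clusters; with three colors all 18 legal colorings are connected by single-node changes.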
7.2 Surveys



Let us consider the case (that we suppose to be non-empty) where there is a large number of
non-equivalent whitenings and we want to study the properties of this ensemble. To this end
it is convenient to introduce the probability that for a generic whitening $\omega$ of the ensemble
we have $\omega(i) = c$; we will denote this probability by $P_i(c)$. We obviously have

$$\sum_{c=0}^{q} P_i(c) = 1 . \tag{55}$$

The quantities $P_i(c)$ generalize the physical concept of magnetization. In the statistical
analysis of the ensemble of configurations of an Ising model, the local variables may have
only two values ($\pm 1$) and

$$P_i(\pm 1) = \frac{1 \pm m_i}{2} , \tag{56}$$

where $m_i$ is the magnetization at the site i. Here q + 1 colors are possible (white included)
and the magnetization is a (q + 1)-dimensional vector, normalized to 1.

In order to do some computations we also have to consider the two-color probabilities
$P_{i,l}(c_1, c_2)$. We will assume a factorization hypothesis: for points i and l that are far
away on the graph the probability $P_{i,l}(c_1, c_2)$ factorizes into the product of two independent
probabilities,

$$P_{i,l}(c_1,c_2) = P_i(c_1)\, P_l(c_2) , \tag{57}$$

neglecting corrections that go to zero (in probability) when N goes to infinity. This hypothesis
is not innocent: there are many cases where it is not correct; however, we cannot discuss
here this important and subtle point.

A similar construction can be done with the cavity coloring and in this way we define
the probabilities $P_{i;k}(c)$, where k is a neighbour of i. These probabilities are called surveys
(and they are denoted by s(i; k)) because they quantify the probability distribution of the
messages sent from the node i to the node k.

Under the previous hypothesis the surveys satisfy simple equations. Let us see one of
them, e.g. the one that relates $P_i(c)$ to the $P_{k;i}(c)$. The final equations are simple, but the
notation may easily become heavy, so let us write everything in an explicit way in the case
where the point i has three neighbours $k_1$, $k_2$, $k_3$. The generalization is intuitive.

The formulae of the previous section tell us which is the color of the site i if we know
the colors $c_1 = \omega(k_1; i)$, $c_2 = \omega(k_2; i)$ and $c_3 = \omega(k_3; i)$. This relation can be written as

$$c = T_3(c_1,c_2,c_3) . \tag{59}$$
Therefore the probability distribution of c can be written as a function of the probability
distributions of the $c_j$, which are supposed to factorize. We thus get

$$P_i(c) = \sum_{c_1,c_2,c_3} P_{k_1;i}(c_1)\, P_{k_2;i}(c_2)\, P_{k_3;i}(c_3)\, \delta_{c,T_3(c_1,c_2,c_3)} . \tag{60}$$

Similar formulae can be written for computing the surveys $P_{i;k}$ as functions of other surveys,
e.g.

$$P_{i;k_1}(c) = \sum_{c_2,c_3} P_{k_2;i}(c_2)\, P_{k_3;i}(c_3)\, \delta_{c,T_2(c_2,c_3)} . \tag{61}$$

In this way we obtain the so called survey propagation equations [3UJ |3TJ E2| • 
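
The structure of these equations can be made explicit with a direct enumeration over the colors sent by the neighbours. The sketch below is an illustration only: a survey is represented as a list of q + 1 probabilities (index 0 is white), and the treatment of contradictory combinations (every color forbidden), which are dropped before renormalizing, is an assumption made here because the text sets that case aside.

```python
from itertools import product
from math import prod

WHITE = 0


def combine(q, incoming):
    """Color implied by the incoming colors, or None if every color is forbidden."""
    forbidden = {c for c in incoming if c != WHITE}
    allowed = [c for c in range(1, q + 1) if c not in forbidden]
    if not allowed:
        return None
    return allowed[0] if len(allowed) == 1 else WHITE


def node_survey(q, incoming_surveys):
    """Distribution of the color of a node from the surveys of its neighbours.

    The incoming colors are treated as independent, as in the factorization
    hypothesis; the cost grows as (q + 1) ** (number of neighbours).
    """
    out = [0.0] * (q + 1)
    for colors in product(range(q + 1), repeat=len(incoming_surveys)):
        weight = prod(P[c] for P, c in zip(incoming_surveys, colors))
        result = combine(q, colors)
        if result is not None:  # assumption: contradictions are discarded
            out[result] += weight
    total = sum(out)
    if total == 0.0:
        raise ValueError("all combinations are contradictory")
    return [x / total for x in out]


half_1 = [0.5, 0.5, 0.0, 0.0]  # white or color 1 with equal probability
half_2 = [0.5, 0.0, 0.5, 0.0]  # white or color 2 with equal probability
print(node_survey(3, [half_1, half_2]))  # [0.75, 0.0, 0.0, 0.25]
```

The survey sent from a node to one of its neighbours is obtained in the same way by leaving that neighbour out of `incoming_surveys`.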

The survey propagation equations always have a trivial solution corresponding to all sites
white: $P_i(0) = 1$ for all i. Depending on the graph there can also be non-trivial solutions
of the survey equations. Let us assume that if such a solution exists, it is unique (this
point too should be checked). In the next section we will find the statistical properties of this
solution and we will identify the values of z where such a solution is present.

7.3 A high-level statistical analysis

We are near the end of our trip. Let us look at eq. (61). The quantity $P_{i;k_1}$ depends
on $P_{k_2;i}$ and $P_{k_3;i}$: we indicate this relation by $P_{i;k_1} = T_2[P_{k_2;i}, P_{k_3;i}]$. However, if we neglect
the large loops, the quantities $P_{k_2;i}$ and $P_{k_3;i}$ do not depend on $P_{i;k_1}$.

If we consider the whole graph we can define the probability $\mathcal{P}[P]$, i.e. the probability
that a given node has a probability P(c). If we consider only points with two neighbours
we have

$$\mathcal{P}[P] = \int d\mathcal{P}[P_1]\, d\mathcal{P}[P_2]\, \delta\big[ P - T_2[P_1, P_2] \big] . \tag{62}$$

If we sum over all the possible coordination numbers with the Poisson distribution the final
equation is

$$\mathcal{P}[P] = e^{-z} \sum_n \frac{z^n}{n!} \int d\mathcal{P}[P_1] \cdots d\mathcal{P}[P_n]\, \delta\big[ P - T_n[P_1, \ldots, P_n] \big] . \tag{63}$$

We have arrived at an integral equation for the probabilities of the local probabilities (i.e. the
surveys). This integral equation looks formidable and it is unlikely that it has an analytic
solution. Fortunately, the solution of integral equations can often be computed numerically
without too much difficulty on present-day computers, and this is what happens in this
case.
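
One common way to attack an integral equation of this kind numerically is to represent the unknown distribution by a large population of surveys and to update it by sampling; this technique (often called population dynamics) is named here as an assumption, since the text does not say which numerical method was used. The sketch below only illustrates the structure of the update: contradictory combinations are dropped and the result renormalized, no further reweighting is applied, and all names and parameters are hypothetical. It is not meant to reproduce the numbers quoted in these lectures.

```python
import math
import random


def poisson(rng, mean):
    """Sample a Poisson variable with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def merge_surveys(q, surveys):
    """Survey produced by n incoming surveys (index 0 is white).

    The distribution of the set of forbidden colors is tracked as a bitmask,
    so the cost is linear in the number of incoming surveys.
    Returns None if every combination is contradictory.
    """
    dist = {0: 1.0}
    for P in surveys:
        new = {}
        for mask, w in dist.items():
            for c, p in enumerate(P):
                if p == 0.0:
                    continue
                m = mask if c == 0 else mask | (1 << (c - 1))
                new[m] = new.get(m, 0.0) + w * p
        dist = new
    out = [0.0] * (q + 1)
    for mask, w in dist.items():
        free = [c for c in range(1, q + 1) if not mask & (1 << (c - 1))]
        if len(free) == 1:
            out[free[0]] += w   # a single color survives
        elif len(free) > 1:
            out[0] += w         # more than one color survives: white
        # no free color: contradiction, dropped (assumption)
    total = sum(out)
    return None if total == 0.0 else [x / total for x in out]


def population_dynamics(q, z, population, sweeps, seed=0):
    """Iterate the sampled version of the integral equation on a population."""
    rng = random.Random(seed)
    pop = [list(P) for P in population]
    for _ in range(sweeps * len(pop)):
        n = poisson(rng, z)                      # Poisson number of inputs
        parents = [rng.choice(pop) for _ in range(n)]
        new = merge_surveys(q, parents)
        if new is not None:
            pop[rng.randrange(len(pop))] = new   # replace a random member
    return pop
```

A population made only of all-white surveys is left unchanged by this update, in agreement with the existence of the trivial solution; whether a non-trivial population survives depends on z and on the starting point.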

One finds that there is a range $z_d < z < z_U$ where the previous integral equation has a
non-trivial solution and its properties can be computed [32]. The fact that the survey equation
has a non-trivial solution does not imply that there are whitenings that correspond to legal
configurations, so that at this stage we cannot compute the critical value of z. This problem
will be solved in the next section.

7.4 The complexity 

In the same way that the entropy counts the number of legal colorings, the complexity
counts the number of different whitenings; more precisely, for a given graph we write

$$\#\text{whitenings} = \exp(\Sigma_G) , \tag{64}$$

where $\Sigma_G$ is the complexity.

We assume that for large N all graphs with the same average coordination number (z)
have the same complexity density:

$$\Sigma_G \approx N \Sigma(z) . \tag{65}$$

There is a simple way to compute the complexity. It consists in counting the variation in
the number of whitenings when we modify the graph. To this end it is convenient to consider
the complexity as a function of N and of M (i.e. the total number of edges). Asymptotically
we should have

$$\Sigma(M,N) = N\, \Sigma\!\left( \frac{2M}{N} \right) . \tag{66}$$

There are two possible modifications we shall consider: adding one edge or adding one site
(with the related edges).

Let us suppose that we add an edge between the nodes i and k. Only those whitenings
in which the colors of the two nodes are different may be extended to the larger graph. We thus
find

$$\#\text{whitenings}(N, M+1) = \#\text{whitenings}(N, M) \left( 1 - \sum_{c=1}^{q} \omega_i(c)\, \omega_k(c) \right) , \tag{67}$$

$$\Sigma(N, M+1) = \Sigma(N, M) + \ln\left( 1 - \sum_{c=1}^{q} \omega_i(c)\, \omega_k(c) \right) \equiv \Sigma(N, M) + \Delta\Sigma_{\text{edge}}(z) .$$

In the same way, if we add a site (and automatically z links on average) we get that
the complexity increases by an additive term $\Delta\Sigma_{\text{site}}$ that can be easily computed (it is
equal to the logarithm of the probability that the total number of different non-white colors
of the nodes to which the newly added node is linked is less than q).

However, if we want to change N by one unit at fixed z we have to add one site and only
z/2 edges. Putting everything together we find that

$$\Sigma(z) = \Delta\Sigma_{\text{site}}(z) - \frac{z}{2}\, \Delta\Sigma_{\text{edge}}(z) .$$
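
The bookkeeping of this section can be sketched as follows. Several assumptions are made for illustration: the probabilities of the colors at a node are given as a list of q + 1 numbers (index 0 is white), the contributions are evaluated for one given edge or site rather than averaged over the graph, and the combination of the two terms follows from the counting above (a site added with z edges carries z/2 edges too many, so z/2 edge contributions are subtracted). The function names are hypothetical.

```python
import math


def delta_sigma_edge(w_i, w_k):
    """Logarithm of the probability that nodes i and k do not share a non-white color."""
    overlap = sum(a * b for a, b in zip(w_i[1:], w_k[1:]))
    return math.log(1.0 - overlap) if overlap < 1.0 else -math.inf


def delta_sigma_site(q, neighbour_surveys):
    """Logarithm of the probability that the neighbours of a new site show
    fewer than q different non-white colors."""
    dist = {0: 1.0}  # distribution of the set of colors seen, as a bitmask
    for P in neighbour_surveys:
        new = {}
        for mask, w in dist.items():
            for c, p in enumerate(P):
                if p == 0.0:
                    continue
                m = mask if c == 0 else mask | (1 << (c - 1))
                new[m] = new.get(m, 0.0) + w * p
        dist = new
    all_colors = (1 << q) - 1
    p_ok = sum(w for mask, w in dist.items() if mask != all_colors)
    return math.log(p_ok) if p_ok > 0.0 else -math.inf


def complexity_density(d_site, d_edge, z):
    """Site term minus z/2 times the edge term (inputs are meant to be averages)."""
    return d_site - 0.5 * z * d_edge


w_i = [0.0, 0.5, 0.5, 0.0]
w_k = [0.0, 0.5, 0.0, 0.5]
print(delta_sigma_edge(w_i, w_k))  # log(0.75), about -0.288
```

When every node is white with probability one both terms vanish, so the trivial solution carries zero complexity.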




We know the probability distribution of the variables $\omega$ from the previous analysis. We
can now get the results for the complexity. It is shown in figure (1) for the case of the three
coloring.

The complexity jumps from 0 to a finite value at $z_d = 4.42$; it decreases with increasing
z and eventually becomes negative at z = 4.69. A negative value of $\Sigma$ implies a number
of whitenings less than one and it is interpreted as the signal that there are no whitenings
(and no legal configurations) with probability 1. Indeed an explicit computation shows that
in the region where the complexity is negative a correct computation of the energy e(z) gives
a non-zero (positive) result. The value where the complexity becomes zero is thus identified
as the colorability threshold $z_c = 4.69$. We have thus obtained [2T] the result we wanted for
q = 3. Similar results may be obtained for higher values of q |2"T] .

8 Conclusions 

The methods we have used in these lectures have been developed in the study of disordered
systems (i.e. spin glasses), first in the context of infinite range models (i.e. in the limit
where $z \to \infty$). They have been extended to finite range models, first at finite temperature
and later at zero temperature (where notable simplifications are present). The concept of
complexity emerged in the study of spin glasses and it was also introduced in the study
of glasses under the name of configurational entropy. The behaviour of the complexity as a
function of z is very similar to what is supposed to happen in glasses as a function of $\beta$.

These methods have been successfully applied to combinatorial optimization in the case
of the K-satisfiability problem, where our goal is to find a set of N boolean variables that satisfy $\alpha N$
logical clauses (of the OR type) containing K variables. Also in this case the satisfiability
threshold jSHESl can be computed (for example we found [3U] that for K = 3 $\alpha_c = 4.267$).

The research on these subjects is quite active. There are many open problems.

• The extension to other models. This is interesting per se; moreover surprises may be
present.

• Verification of the self-consistency of the different hypotheses made and the identification
of effective strategies in the case where they fail [3^1 EHj.

• The construction of effective algorithms for finding a solution of the optimization
problem. A first algorithm has been proposed |30J EU EH] and it has been later
improved by adding a simple form of backtracking 38jJ. A goal is to produce an
algorithm that for large N finds a solution with probability one on a random graph
in polynomial time as soon as $z < z_c$ (i.e. in a computer time that is bounded by
$N^{A(z)}$). Finding this algorithm is interesting from the theoretical point of view (it is
not clear at all if such an algorithm exists) and it may have practical applications.
• Last but not least, everything that we have said up to now was derived in physicists'
style. One should be able to transform the results derived in this way into rigorous
theorems. It is very interesting that after a very long effort Talagrand, using
some crucial results of Guerra jlU], has recently been able to prove that a similar, but
more complex, construction gives the correct results in the case of infinite range spin
glasses (i.e. the Sherrington-Kirkpatrick model), which was the starting point of the
whole approach. It is a relief to know that the foundations of this elaborate building
are sound. Some rigorous results have also been obtained for models with finite z [H].

The whole field is developing very fast and it is likely that in a few years we will have
a much clearer and more extended picture.